Antipattern: Always Depend on One’s Parent
Let's take a look at some drawbacks of maintaining trees with an Adjacency List.
Let’s imagine a query in which there is a need to retrieve the parents or children of a node. What should be our solution to such a query?
The answer is Adjacency List.
Any solution that we would find for this problem would include modifying the code that manages the tree data structures. Let’s discuss it in detail.
Adjacency list#
The solution we might first think of, and that is commonly shown in books and articles, is to add a column parent_id
. This column references another comment in the same table, and the user can create a foreign key constraint to enforce this relationship.
This design is called Adjacency List. It’s probably the most common design software developers use to store hierarchical data.
The SQL to define this table is presented below.
The Adjacency List Entity-Relationship Diagram is shown below.
An illustration of the tree, “Threaded comments,” is shown below.
Querying a tree to find the immediate children of a node#
Adjacency List fails to be a solution for one of the most common tasks we need to do with a tree: query all descendants.
We can retrieve a comment and its immediate children using a relatively simple query:
In the results, we will see the data for comment on the left side and the information of its immediate child or children on the right side.
However, this queries only two levels of the tree. One characteristic of a tree is that it can extend to any depth, so we need to be able to query the descendants without regard to the number of levels. For example, we may need to compute the COUNT()
of comments in the thread or the SUM()
of the cost of parts in a mechanical assembly.
This kind of query is awkward when we use Adjacency List because each level of the tree corresponds to another JOIN
, and the number of JOIN
s in an
SQL query must be fixed.
Querying a tree to find the descendants of a node#
The following query retrieves a tree of up to four levels of depth but cannot retrieve the tree beyond that.
Note: There are no records for the fourth level, so we are seeing NULL in all those cells.
This query is also awkward because it includes descendants from progressively
deeper levels by adding more columns. This makes it hard to compute an
aggregate such as COUNT()
.